Joint Optimization of Spectro-Temporal Features and Deep Neural Nets for Robust Automatic Speech Recognition

نویسندگان

  • György Kovács
  • László Tóth
چکیده

In speech recognition, feature extraction and acoustical model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural network based acoustic models into a single process. We found earlier that this approach can be successfully applied for the recognition of speech. In this paper, we propose two further improvements to our method based on recent advances in neural net technology and extend our evaluation to speech conatminated with new types of noise. By repeating our experiments on TIMIT phone recognition tasks using clean and noise contaminated speech, we can compare the recognition performance of the original framework with our new, modified framework. The results indicate that both these modifications significantly improve the recognition performance of our framework. Moreover, we will show that these modifications allow us to achieve a substantially better performance than what we got earlier.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Informative spectro-temporal bottleneck features for noise-robust speech recognition

Spectro-temporal Gabor features based on auditory knowledge have improved word accuracy for automatic speech recognition in the presence of noise. In previous work, we generated robust spectro-temporal features that incorporated the power normalized cepstral coefficient (PNCC) algorithm. The corresponding power normalized spectrum (PNS) is then processed by many Gabor filters, yielding a high d...

متن کامل

Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features

Spectro-temporal feature extraction and multi-band processing were both designed to make the speech recognizers more robust. Although they have been used for a long time now, very few attempts have been made to combine them. This is why here we integrate two spectrotemporal feature extraction methods into a multi-band framework. We assess the performance of our spectro-temporal feature sets bot...

متن کامل

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...

متن کامل

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Acta Cybern.

دوره 22  شماره 

صفحات  -

تاریخ انتشار 2015